Chinese calligraphy
is exquisite and eloquent
but it causes major
problems for
today's computers

BY ROBERT J. TROTTER


How do you take a language with 50,000
characters and make it compatible with a
conventional computer keyboard of 50
characters so that a country with one
billion people can catch up with
20th-century technology?

The problem, of course, is the Chinese
written language. Its characters are
beautiful, but there are just too many of
them to be handled efficiently by existing
computers. The problem began 6,000 years
ago when the Chinese language originated
with pictures scratched on bones and
shells. These pictographs gradually
evolved into ideographs (symbols used to
represent words and ideas) and finally
into the current square Han characters.
Over the years, Chinese writing followed a
pattern of divergence that led to the
eventual development of 50,000 characters.
Indo-European languages, on the other
hand, followed a pattern of convergence.
Hundreds of ancient hieroglyphs were
eventually funneled by the Egyptians,
Hebrews, Greeks and Romans into the 26
standard characters of the Latin alphabet.
Today, 10,000 Han characters are in
circulation and more are being coined. To
be "newspaper literate" one must know at
least 2,000 characters; to be
college-educated, 5,000.

As beautiful and as time-honored as the
Han characters may be, they do present
certain obstacles to a technological
society. In the 1920s, for instance, the
Chinese intellectual Lu Xun complained that
Chinese characters are too complicated to
educate the masses. Students, he said,
spend too much time learning to write Han
characters and practicing calligraphy, at
the expense of their education in science
and mathematics. For a country that is
trying to upgrade itself technologically,
this is serious. Chinese typewriting and
typesetting, for example, are complicated,
expensive and time-consuming. But perhaps
the greatest problem is with computers.
Fortunately, computers may also hold the
eventual solution to that problem.

The best answer would be an optic
scanner or some other device that could
read the 10,000 Chinese characters (or
Korean or Japanese, which are based on
Chinese). But this is not possible with
current computer technology, so linguistics
and computer experts have been working
on various complex keyboard arrangements
and coding schemes that allow the
Chinese, Koreans and Japanese to use
computers in their own languages.

One example is the IBM information
processing system developed for Japan
and Taiwan. After ten years of R&D, IBM
researchers developed a system that can
do everything that can be done with
English-language equipment. Software is
the key to its success, explains IBM's
Charles Swift. Conventional data
processing uses one byte, or eight bits, of
coded information per character. But that
yields only 256 character combinations.
This system can handle thousands of
characters because it processes things in
terms of two bytes instead of one. A
special keyboard and a character generator
were the other necessary elements in its
design.
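
The arithmetic behind the one-byte limit can be
checked in a few lines. This is only a sketch of
the counting argument, not a description of IBM's
software; the function names are assumptions made
for illustration.

```python
BITS_PER_BYTE = 8


def code_points(num_bytes: int) -> int:
    """Return how many distinct codes fit in num_bytes bytes."""
    return 2 ** (BITS_PER_BYTE * num_bytes)


def fits(repertoire_size: int, num_bytes: int) -> bool:
    """Check whether a character set of the given size can be
    given one unique code per character (hypothetical helper)."""
    return repertoire_size <= code_points(num_bytes)


if __name__ == "__main__":
    print(code_points(1))     # one byte per character
    print(code_points(2))     # two bytes per character
    print(fits(10_000, 1))    # 10,000 Han characters, one byte
    print(fits(10_000, 2))    # 10,000 Han characters, two bytes
```

One byte gives 2^8 = 256 codes; two bytes give
2^16 = 65,536, which is enough room for even the
50,000 characters of the full Han repertoire.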

The keyboard is large, large enough to
hold more than 2,000 characters. Each key
has 12 characters on it, and at the side
there is an additional 12-digit keyboard. If
the operator presses key number 12, for
example, then selects a key on the large
board, the 12th character on that key is
displayed. It takes an operator about six
weeks of training to get to a top speed of
60 to 75 characters per minute. Top speed
on a Japanese typewriter is about 35
characters per minute.
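
The two-step selection described above can be
sketched as a simple table lookup. This is a
minimal illustration under stated assumptions: the
placeholder names (char0, char1, ...) stand in for
real characters, and the layout order is invented,
since the article does not say how IBM arranged
the keys.

```python
CHARS_PER_KEY = 12  # from the article: 12 characters on each key


def build_layout(characters):
    """Group a flat list of characters into keys of 12 each."""
    return [
        characters[i:i + CHARS_PER_KEY]
        for i in range(0, len(characters), CHARS_PER_KEY)
    ]


def select(layout, position, key_index):
    """Return the character chosen by two key presses.

    position  -- 1..12, pressed on the side keypad
    key_index -- 0-based index of the key on the large board
    """
    if not 1 <= position <= CHARS_PER_KEY:
        raise ValueError("position must be between 1 and 12")
    return layout[key_index][position - 1]


# Placeholder repertoire of 2,400 entries (an assumed size;
# the article says only "more than 2,000").
repertoire = [f"char{n}" for n in range(2400)]
layout = build_layout(repertoire)

if __name__ == "__main__":
    print(len(layout))            # number of keys on the board
    print(select(layout, 12, 0))  # 12th character on the first key
```

At 12 characters per key, a board holding 2,000
characters needs at least 167 keys, which is why
the keyboard is so large.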

High resolution allows for accurate
video display of the complex characters,
and an ink jet printer can produce 37
characters per second at the terminal.
Laser technology allows the system to
print 10,000 lines per minute.

Wang Laboratories of Lowell, Mass., has
taken a different approach to the Oriental
character problem. Last year they began
marketing the Ideographic Word Processing
System that can operate in conventional
(Mandarin) Chinese, simplified Chinese or
Japanese.

Instead of displaying thousands of
characters on a keyboard, the Wang system
uses a coding system. In this way, a
minimum number of keys can be used to
generate 10,000 characters. Each character
has a six-digit identification number
based on the shape of the character.
According to the company, "Users familiar
with the basic shape and structure of
Chinese characters can be trained to use
the method (called the three corner coding
method) quickly and easily. In fact, an
operator need use only 297 character
elements and 15 rules to be fully
proficient on the entire system." The
system can be learned in two weeks. It
offers editing capabilities such as
insertion, replacement and deletion of
characters, lines, paragraphs or entire
sections of text.
Standard disk storage can range up to 
l375«MlliopK:haracters. 

A less complicated system for entering
Chinese characters may be the one
developed at Cornell University by Paul L.
King, with grant money from the National
Cash Register Corp. King says a
Chinese-speaking person with the
equivalent of a junior high school
education can learn to operate this system
in about half an hour and enter characters
at a rate of 50 per minute, nearly five
times as fast as a highly skilled person
can operate a Chinese typewriter. The
system is easy to operate because it uses a
12-digit keyboard to enter the thousands of
characters. Each digit describes a basic
shape used in Chinese characters in one of
the four quadrants into which all the
characters are divided. By selecting up to
four keys, an operator can identify an
entire character. Because of the complexity
of the characters, however, the same four
digits can sometimes produce ten or more
characters, all similar enough to have
been described by the same digits but very
different in meaning. When this happens,
the system uses linguistic rules to
automatically select the correct character.
And if the automatic selection process is
not specific enough, the computer displays
the remaining choices and the operator
makes a manual selection. The system
contains about 2,500 words, and additional
sets of 500 special vocabulary words are
being developed.
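
The three stages described above (code lookup,
automatic narrowing, manual choice) can be
sketched as follows. Everything specific here is
an assumption made for illustration: the codes,
the placeholder character names, and the use of an
allowed-pairs table as a stand-in for King's
"linguistic rules," which the article does not
spell out.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical code table: a code of up to four digits maps to
# every character whose quadrant shapes it describes.
CODE_TABLE: Dict[str, List[str]] = {
    "1234": ["char_a"],
    "5078": ["char_b", "char_c", "char_d"],
}

# Hypothetical stand-in for linguistic rules: pairs of
# (previous character, candidate) that may occur together.
ALLOWED_PAIRS: Set[Tuple[str, str]] = {("char_a", "char_c")}


def resolve(
    code: str,
    previous: Optional[str],
    choose: Callable[[List[str]], str],
) -> str:
    """Turn a typed code into one character."""
    if not 1 <= len(code) <= 4:
        raise ValueError("a code has one to four digits")
    candidates = CODE_TABLE.get(code)
    if not candidates:
        raise KeyError(code)
    if len(candidates) == 1:
        return candidates[0]  # the code is unambiguous
    # Automatic selection: keep candidates the rules allow
    # after the previous character.
    narrowed = [c for c in candidates if (previous, c) in ALLOWED_PAIRS]
    if len(narrowed) == 1:
        return narrowed[0]
    # Not specific enough: show the remaining choices and let
    # the operator pick one.
    return choose(narrowed or candidates)


if __name__ == "__main__":
    pick_first = lambda options: options[0]
    print(resolve("5078", "char_a", pick_first))
    print(resolve("5078", None, pick_first))
```

The operator is consulted only when the rules fail
to narrow the list to a single character, which
matches the order of steps given in the article.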

And if an operator doesn't want to go to
the trouble of learning one of these
systems, there may eventually be a system
for on-line recognition of handwritten
Chinese characters. E.F. Yhap and E.C.
Greanias of the IBM Thomas J. Watson
Research Center in Yorktown Heights, N.Y.,
describe this still experimental process in
the May IBM Journal of Research and
Development. It consists of a specially
designed tablet that produces a pattern of
varying electromagnetic signals that are
picked up by the pen and then fed through
a five-part recognition program.
Preliminary tests of the system, which
recognizes 2,249 symbols, have found it to
have a recognition rate of 97.8 percent.

"On-line tablet recognition," say the
researchers, "offers considerable promise
as a natural data entry device for casual
users of information systems."

H.C. Tien offers what he considers to be
an even more natural data entry system.
Instead of working with
character-recognition systems, he wants to
computerize China's language by
alphabetizing it in such a way that it will
work with existing Latin alphabet
computers, and after 20 years, he thinks
he's done it.

Tien, an M.D., is editor of the Michigan
Institute for Psychosynthesis in Lansing.
The institute's goal, he says, is "the union
of Eastern and Western medicines,
thoughts and languages." And the quickest
way of doing this, he believes, is by
alphabetizing the Chinese language. The
system he developed, he says, "has the
power to transcribe common spoken
Chinese (Putonghua) and to facilitate and
accelerate printing, typing, telegraphy,
computer input-output, indexing, library
cataloging, scientific reprints ..."

Tien is following a long tradition of
attempts to reform and simplify the Chinese
language. The most recent attempts
include the simplification of Han
characters, the teaching of Putonghua (the
Beijing dialect) throughout the country and
the introduction in 1958 of the Pinyin
System, a Chinese phonetic alphabet.

Although based on the Latin alphabet,
the Pinyin system resists computerization
because of the many homophones (words
that sound the same but have different
meanings) in the Chinese language. Many
Han characters, for example, are spelled
the same way in Pinyin. When spoken,
these words are differentiated by tone.
When written in Pinyin, they are
differentiated by diacritical marks (an
extra burden for computers). But there are
only four tonal diacritical marks in Pinyin,
and in some cases these marks have to
separate more than four similar-sounding
characters. The word "ma," for instance,
has 18 homophones; "li" has at least 81,
and "yi" has 126.

In order to eliminate the diacritical
marks and solve the homophone problem,
Tien uses a letter-doubling technique and
a set of 189 suffixes. Most Chinese syllables
consist of a consonant and a vowel, so
letter doubling can yield four distinct
spellings (ba, bba, bbaa and baa), which
correspond to the four tones.
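
The letter-doubling idea can be sketched for a
simple consonant-plus-vowel syllable. Two
assumptions are made loudly here: that the four
spellings are listed in the order of tones one
through four, which the article does not state,
and that the same pattern applies to any single
consonant and single vowel. The actual Pinxxiee
rules may differ.

```python
# Assumed mapping: tone number -> (double the consonant?,
# double the vowel?), following the order ba, bba, bbaa, baa.
DOUBLING = {
    1: (False, False),  # ba
    2: (True, False),   # bba
    3: (True, True),    # bbaa
    4: (False, True),   # baa
}


def spell(consonant: str, vowel: str, tone: int) -> str:
    """Spell a consonant-vowel syllable with its tone marked
    by letter doubling instead of a diacritical mark."""
    if tone not in DOUBLING:
        raise ValueError("tone must be 1, 2, 3 or 4")
    double_consonant, double_vowel = DOUBLING[tone]
    c = consonant * 2 if double_consonant else consonant
    v = vowel * 2 if double_vowel else vowel
    return c + v


def tone_of(spelling: str, consonant: str, vowel: str) -> int:
    """Recover the tone from a spelling (inverse of spell)."""
    for tone in DOUBLING:
        if spell(consonant, vowel, tone) == spelling:
            return tone
    raise ValueError("not a doubled spelling of this syllable")


if __name__ == "__main__":
    print([spell("b", "a", t) for t in (1, 2, 3, 4)])
    print(tone_of("bbaa", "b", "a"))
```

Because every spelling uses only unaccented Latin
letters, the tone survives on any keyboard or
printer that handles the ordinary alphabet.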

"We are now one step closer in the
computerization of Chinese characters,"
says Tien, "but another obstacle remains."
Most Chinese characters consist of two
parts, or radicals. One is spoken, the other
is silent (it may signify such things as the
gender of the word). A character
pronounced ba, for example, may be made up
of the radical pronounced ba and one of a
number of silent radicals that change the
meaning of the word. Tien uses silent
suffixes made up of letters or letter
combinations to represent these silent Han
radicals.

This, in essence, is what Tien calls the
Pinxxiee System. It includes the phonetic
syllables of the Pinyin System (which is
being learned by all school children in
China), the letter-doubling system for the
tones and silent suffixes for the silent
radicals. "It is obvious," he asserts, "that
every character may now be uniquely and
equivalently transformed for programing
into the computer." The Pinxxiee system,
he continues, can lead to the
alphabetization of all Han characters with
existing computer technology without
waiting for further development of pattern
recognition or image processing techniques
and equipment.

There is, however, still one major
problem: convincing the Chinese to use the
Pinxxiee. Cultural pride, respect for the
ancestors who contributed to the heritage
of the Chinese script and the force of habit
of thousands of years are involved. Tien
admits this won't be an easy problem to
solve, but he is working on it. In the case
of the silent suffixes, for example, he has
attempted, when possible, to use Latin
letters that at least look something like
the original Han radicals.

Tien is also working through formal
scientific channels and the Chinese
government. He described the Pinxxiee
system last fall in Hong Kong at an
international computer conference and has
been dealing for several years with the
Chinese Ministry of Education. He has
published a two-volume
English-Han-Pinyin-Pinxxiee dictionary that
contains almost 12,000 computer-coded
words. "This is just the beginning," he
says, and quotes Lu Xun: "Shall we sacrifice
ourselves for the ideographs or shall we
sacrifice the ideographs for ourselves? All
but the insane can answer this
immediately."
Yannd   yn(-ynd)   Buut

English gloss    Code
reckon           1-123-000
edit             1-827-000
Obituary         1-831-000
recognise        1-846-000
jeer             1-840-000
expose           1-822-300
disorderly       1-823-200
research         1-827-100
let              1-832-200
Enquiry          1-883-200
sneer            1-838-300
View             1-814-600
ending           1-842-800
excuse           1-222-382.420
lecture          1-843-300
record           1-882-000
visiting         1-812-840
explaining       1-822-430
how              1-828-820
refrain          1-822-830
chant            1-828-460
amazed           1-828-740
nitwit           1-838-410
Suiting          1-846-810